126 research outputs found

    Multi-GPU Acceleration of the iPIC3D Implicit Particle-in-Cell Code

    iPIC3D is a widely used massively parallel Particle-in-Cell code for the simulation of space plasmas. However, its current implementation does not support execution on multiple GPUs. In this paper, we describe the porting of the iPIC3D particle mover to GPUs and the optimization steps taken to increase performance and parallel scaling on multiple GPUs. We analyze the strong scaling of the mover on two GPU clusters and evaluate its performance and acceleration. The optimized GPU version, which uses pinned memory and asynchronous data prefetching, outperforms the corresponding CPU version by 5-10x on two different systems equipped with NVIDIA K80 and V100 GPUs. Comment: Accepted for publication in ICCS 201

    A body at the edge of language: writing anorexia, bulimia and recovering

    This practice-led life writing project explores this writer-scholar's experience of her eating disorder through a series of poetic essays developed from material and somatic writing methods including ink-and-paper, found text, and movement. Through these particular methods, and the episodic acts of the writing itself, this PhD discovers a form of somatic life writing that both demonstrates and analyses the lived experience of this psycho-somatic disorder. This research project responds to the challenges of writing anorexia, bulimia and recovering, by developing material writing methods to negotiate self-erasure, narrative authority and embodied memory on the page. The PhD examines the symbiotic relation between writing and (not) eating in ways that are analogous, metaphoric and mutually affective. It draws on a range of writers and feminist materialist scholars to propose that when the tensions of eating disorder are transposed to language and navigated on the page, moments can be found where bodies and writing are constituted and de-constituted. In locating their life-affirming entanglement, this writing practice counteracts the erasure and containment of the condition

    NVIDIA Tensor Core Programmability, Performance & Precision

    The NVIDIA Volta GPU microarchitecture introduces a specialized unit, called a "Tensor Core", that performs one matrix-multiply-and-accumulate on 4x4 matrices per clock cycle. The NVIDIA Tesla V100 accelerator, featuring the Volta microarchitecture, provides 640 Tensor Cores with a theoretical peak performance of 125 Tflops/s in mixed precision. In this paper, we investigate current approaches to programming NVIDIA Tensor Cores, their performance, and the precision loss due to computation in mixed precision. Currently, NVIDIA provides three different ways of programming matrix-multiply-and-accumulate on Tensor Cores: the CUDA Warp Matrix Multiply Accumulate (WMMA) API; CUTLASS, a templated library based on WMMA; and cuBLAS GEMM. After experimenting with different approaches, we found that NVIDIA Tensor Cores can deliver up to 83 Tflops/s in mixed precision on a Tesla V100 GPU, seven and three times the performance in single and half precision respectively. A WMMA implementation of batched GEMM reaches a performance of 4 Tflops/s. While precision loss due to matrix multiplication with half precision input might be critical in many HPC applications, it can be considerably reduced at the cost of increased computation. Our results indicate that HPC applications using matrix multiplications can strongly benefit from the use of NVIDIA Tensor Cores. Comment: This paper has been accepted by the Eighth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES) 201
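
    The precision loss mentioned in the abstract above can be illustrated on the CPU with a minimal sketch. It is not the paper's method and does not touch Tensor Cores, WMMA, CUTLASS, or cuBLAS. It only emulates the numerical pattern of half-precision inputs with single-precision accumulation using Python's standard `struct` module. The helper names `to_half`, `to_single`, `dot_mixed`, and `dot_reference`, the vector length, and the random seed are all assumptions made for illustration.

```python
import math
import random
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_single(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_mixed(a, b):
    """Dot product with half-precision inputs and single-precision accumulation."""
    acc = 0.0
    for x, y in zip(a, b):
        # Inputs are rounded to half precision; the running sum is rounded
        # to single precision after every accumulation step.
        acc = to_single(acc + to_half(x) * to_half(y))
    return acc


def dot_reference(a, b):
    """Double-precision reference using an accurately rounded summation."""
    return math.fsum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily
    a = [rng.random() for _ in range(1024)]
    b = [rng.random() for _ in range(1024)]
    ref = dot_reference(a, b)
    mixed = dot_mixed(a, b)
    print(f"relative error of mixed precision: {abs(mixed - ref) / abs(ref):.3e}")
```

    Most of the error in this sketch comes from rounding the inputs to half precision, which is consistent with the abstract's remark that the loss stems from half precision input. The abstract does not say how the loss is reduced beyond "increased computation", so no correction scheme is shown here.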

    TensorFlow Doing HPC

    TensorFlow is a popular emerging open-source programming framework supporting the execution of distributed applications on heterogeneous hardware. While TensorFlow was initially designed for developing Machine Learning (ML) applications, it aims to support the development of a much broader range of applications that are outside the ML domain and can possibly include HPC applications. However, very few experiments have been conducted to evaluate TensorFlow performance when running HPC workloads on supercomputers. This work addresses this gap by designing four traditional HPC benchmark applications: STREAM, matrix-matrix multiply, Conjugate Gradient (CG) solver and Fast Fourier Transform (FFT). We analyze their performance on two supercomputers with accelerators and evaluate the potential of TensorFlow for developing HPC applications. Our tests show that TensorFlow can fully take advantage of high performance networks and accelerators on supercomputers. Running our TensorFlow STREAM benchmark, we obtain over 50% of theoretical communication bandwidth on our testing platform. We find an approximately 2x, 1.7x and 1.8x performance improvement when increasing the number of GPUs from two to four in the matrix-matrix multiply, CG and FFT applications respectively. All our performance results demonstrate that TensorFlow has high potential to emerge as an HPC programming framework for heterogeneous supercomputers as well. Comment: Accepted for publication at The Ninth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES'19
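
    To put the reported two-to-four GPU speedups in context, the short sketch below converts each one into a fraction of ideal scaling, where doubling the GPU count would ideally double performance. The function name `scaling_efficiency` and the dictionary `reported` are illustrative assumptions. The speedup values are the approximate figures quoted in the abstract above.

```python
def scaling_efficiency(speedup: float, gpus_before: int, gpus_after: int) -> float:
    """Return the measured speedup as a fraction of the ideal speedup."""
    ideal = gpus_after / gpus_before  # e.g. 4 / 2 = 2x for perfect scaling
    return speedup / ideal


# Approximate speedups quoted in the abstract for going from two to four GPUs.
reported = {"matrix-matrix multiply": 2.0, "CG": 1.7, "FFT": 1.8}

for name, speedup in reported.items():
    efficiency = scaling_efficiency(speedup, gpus_before=2, gpus_after=4)
    print(f"{name}: {efficiency:.0%} of ideal scaling")
```

    Under this definition the three applications reach roughly 100%, 85% and 90% of ideal scaling, respectively.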

    Signatures of Secondary Collisionless Magnetic Reconnection Driven by Kink Instability of a Flux Rope

    The kinetic features of secondary magnetic reconnection in a single flux rope undergoing internal kink instability are studied by means of three-dimensional Particle-in-Cell simulations. Several signatures of secondary magnetic reconnection are identified in the plane perpendicular to the flux rope: a quadrupolar electron and ion density structure and a bipolar Hall magnetic field develop in proximity of the reconnection region. The most intense electric fields form perpendicularly to the local magnetic field, and a reconnection electric field is identified in the plane perpendicular to the flux rope. An electron current develops along the reconnection line in the opposite direction of the electron current supporting the flux rope magnetic field structure. Along the reconnection line, several bipolar structures of the electric field parallel to the magnetic field occur making the magnetic reconnection region turbulent. The reported signatures of secondary magnetic reconnection can help to localize magnetic reconnection events in space, astrophysical and fusion plasmas

    Nonlinear evolution of the magnetized Kelvin-Helmholtz instability: from fluid to kinetic modeling

    The nonlinear evolution of collisionless plasmas is typically a multi-scale process where the energy is injected at large, fluid scales and dissipated at small, kinetic scales. Accurately modelling the global evolution requires taking into account the main micro-scale physical processes of interest. This is why comparison of different plasma models is today an imperative task aiming at understanding cross-scale processes in plasmas. We report here the first comparative study of the evolution of a magnetized shear flow through a variety of different plasma models, using magnetohydrodynamic, Hall-MHD, two-fluid, hybrid kinetic and full kinetic codes. Kinetic relaxation effects are discussed to emphasize the need for kinetic equilibria to study the dynamics of collisionless plasmas in nontrivial configurations. Discrepancies between models are studied both in the linear and in the nonlinear regime of the magnetized Kelvin-Helmholtz instability, to highlight the effects of small scale processes on the nonlinear evolution of collisionless plasmas. We illustrate how the evolution of a magnetized shear flow depends on the relative orientation of the fluid vorticity with respect to the magnetic field direction during the linear evolution when kinetic effects are taken into account. Although we find that small scale processes differ between the different models, we show that the feedback from small, kinetic scales to large, fluid scales is negligible in the nonlinear regime. This study shows that kinetic modeling validates the use of a fluid approach at large scales, which encourages the development and use of fluid codes to study the nonlinear evolution of magnetized fluid flows, even in the collisionless regime

    Desynchronization and Wave Pattern Formation in MPI-Parallel and Hybrid Memory-Bound Programs

    Analytic, first-principles performance modeling of distributed-memory parallel codes is notoriously imprecise. Even for applications with extremely regular and homogeneous compute-communicate phases, simply adding communication time to computation time often does not yield a satisfactory prediction of parallel runtime due to deviations from the expected simple lockstep pattern caused by system noise, variations in communication time, and inherent load imbalance. In this paper, we highlight the specific cases of provoked and spontaneous desynchronization of memory-bound, bulk-synchronous pure MPI and hybrid MPI+OpenMP programs. Using simple microbenchmarks we observe that although desynchronization can introduce increased waiting time per process, it does not necessarily cause lower resource utilization but can lead to an increase in available bandwidth per core. In case of significant communication overhead, even natural noise can shove the system into a state of automatic overlap of communication and computation, improving the overall time to solution. The saturation point, i.e., the number of processes per memory domain required to achieve full memory bandwidth, is pivotal in the dynamics of this process and the emerging stable wave pattern. We also demonstrate how hybrid MPI+OpenMP programming can prevent desirable desynchronization by eliminating the bandwidth bottleneck among processes. A Chebyshev filter diagonalization application is used to demonstrate some of the observed effects in a realistic setting. Comment: 18 pages, 8 figure
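
    The naive model that the abstract above calls unsatisfactory, and the fully overlapped case it contrasts with, can be written down as two simple bounds. This is a minimal sketch, not the paper's model: the function names and the example timings are hypothetical, and real runs are expected to fall somewhere between (or outside) these idealized predictions because of noise and load imbalance.

```python
def lockstep_time(t_comp: float, t_comm: float, iterations: int) -> float:
    """Naive prediction: every iteration pays computation plus communication."""
    return iterations * (t_comp + t_comm)


def overlapped_time(t_comp: float, t_comm: float, iterations: int) -> float:
    """Idealized prediction when communication fully overlaps computation."""
    return iterations * max(t_comp, t_comm)


if __name__ == "__main__":
    # Hypothetical per-iteration costs in seconds, chosen only for illustration.
    t_comp, t_comm, iterations = 0.010, 0.004, 1000
    naive = lockstep_time(t_comp, t_comm, iterations)
    ideal = overlapped_time(t_comp, t_comm, iterations)
    print(f"lockstep model:   {naive:.2f} s")
    print(f"full overlap:     {ideal:.2f} s")
    print(f"possible saving:  {1 - ideal / naive:.0%}")
```

    The gap between the two bounds grows with the share of communication in each iteration, which matches the abstract's observation that automatic overlap matters most when communication overhead is significant.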

    Kinetic simulations of magnetic reconnection in presence of a background O+ population

    Particle-in-Cell simulations of magnetic reconnection with an H+ current sheet and a mixed background plasma of H+ and O+ ions are performed using physical mass ratios. Four main results are shown. First, the O+ presence slightly decreases the reconnection rate, and the magnetic reconnection evolution depends mainly on the lighter H+ ion species in the presented simulations. Second, the Hall magnetic field is characterized by a two-scale structure in the presence of O+ ions: it reaches sharp peak values in a small area in the proximity of the neutral line, and then decreases slowly over a large region. Third, the two background species initially separate in the outflow region because H+ and O+ ions are accelerated by different mechanisms occurring on different time scales and with different strengths. Fourth, the effect of a guide field on the O+ dynamics is studied: the O+ presence does not change the reconnected flux and all the characteristic features of guide field magnetic reconnection are still present. Moreover, the guide field introduces an O+ circulation pattern between separatrices that enhances high O+ density areas and depletes low O+ density regions in the proximity of the reconnection fronts. The importance and the validity of these results are finally discussed